A computational genomics pipeline for prokaryotic sequencing projects
نویسندگان
چکیده
MOTIVATION New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. RESULTS We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. AVAILABILITY AND IMPLEMENTATION The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.
منابع مشابه
In Silico Whole Genome Sequencer and Analyzer (iWGS): a Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies
The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The...
متن کاملPanWeb: A web interface for pan-genomic analysis
With increased production of genomic data since the advent of next-generation sequencing (NGS), there has been a need to develop new bioinformatics tools and areas, such as comparative genomics. In comparative genomics, the genetic material of an organism is directly compared to that of another organism to better understand biological species. Moreover, the exponentially growing number of depos...
متن کاملSubmission Type: Letter (Methods) in silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies
The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with small and streamlined genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The incre...
متن کاملin silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies
The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The...
متن کاملComputational Challenges of Next Generation Sequencing Pipelines Using Heterogeneous Systems
We are rapidly entering the era of genomics. The dramatic cost reduction of DNA sequencing due to the introduction of Next Generation Sequencing (NGS) techniques has resulted in an exponential growth of genetics data. The amount of data generated, and its associated processing into useful information, poses serious computational challenges. Here, we give a brief introduction of NGS, show a typi...
متن کامل